Homepage Search in Blog Collections
Abstract
A blog homepage consists of many individual blog postings. Current blog search services focus on retrieving postings but there is also a need to identify relevant blog homepages. In this paper, we investigate the properties of blog collections and describe the differences between blog homepage searches and general web page searches. We also introduce and evaluate a variety of approaches for blog homepage search. Our results show that noise reduction and the appropriate combination of techniques can achieve significant improvements in retrieval performance compared to a baseline approach and a traditional named page finding approach for general web pages.
Similar Resources
Leveraging Collection Structure in Information Retrieval With Applications to Search in Conversational Social Media
Social media collections are becoming increasingly important in the everyday life of Internet users. Recent statistics show that sites hosting social media and community-generated content account for five of the top ten most visited websites in the United States [4], are visited regularly by a broad cross-section of Internet users [61, 67, 115] and host an enormous quantity of information [119,...
What's New at TREC: Blog and Legal Discovery Search at TREC-2006
This past year, the Text REtrieval Conference (TREC) started two new tracks. One was the Blog track – given a large collection of blog posts and their comments, the task was to locate opinions about products, people, organizations, etc. The other new track was the Legal Track. This track seeks to build test collections for searches that occur during the discovery portion of a lawsuit. The Legal...
Math-Net, a model for information and communication systems in sciences
A homepage is the Web entry point and a signpost to a Web site (other common terms therefore are "portals" or "sitemaps"). Web sites of (mathematical) departments consist of collections of interrelated information of the institution. A clear and intuitive structure of the homepage is essential for a user-friendly navigation and search. In fact, however, the structure of department homepages dif...
Search in Conversational Social Media Collections
Community generated content has become increasingly important over the past several years: blogs, Wikipedia, online forums, twitter, Yahoo! Answers, Facebook and many other online communities that foster social interaction have flourished. However, studying “Search in Social Media” as a distinct sub-field of information retrieval poses some questions. Although there is a loose consensus of the ...
University of Glasgow at TREC 2009: Experiments with Terrier
In TREC 2009, we extend our Voting Model for the faceted blog distillation, top stories identification, and related entity finding tasks. Moreover, we experiment with our novel xQuAD framework for search result diversification. Besides fostering our research in multiple directions, by participating in such a wide portfolio of tracks, we further develop the indexing and retrieval capabilities of...
Publication date: 2007